Multi-level post-processing for Korean character recognition using morphological analysis and linguistic evaluation

Authors

  • Gary Geunbae Lee
  • Jong-Hyeok Lee
  • JinHee Yoo
Abstract

Most post-processing methods for character recognition rely on contextual information at the character and word-fragment levels. However, owing to the linguistic characteristics of Korean, such low-level information alone is not sufficient for high-quality character-recognition applications, and much higher-level contextual information is needed to improve the recognition results. This paper presents a domain-independent post-processing technique that utilizes multi-level morphological, syntactic, and semantic information as well as character-level information. The proposed post-processing system performs three-level processing: candidate character-set selection, candidate eojeol (Korean word) generation through morphological analysis, and final single eojeol-sequence selection by linguistic evaluation. All the required linguistic information and probabilities are automatically acquired from a statistical corpus analysis. Experimental results demonstrate the effectiveness of our method, yielding an error correction rate of 80.46% and an improved recognition rate of 95.53%, up from a pre-post-processing rate of 71.2%, for single best-solution selection.
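
The three levels named in the abstract can be illustrated with a minimal Python sketch. This is not the authors' implementation: the function names, the toy lexicon (which stands in for a real morphological analyzer), the OCR confidences, and the bigram log-probabilities (which stand in for corpus-derived linguistic evaluation) are all hypothetical values chosen for illustration, and the sketch assumes every eojeol position yields at least one candidate.

```python
from itertools import product
from math import log


def select_candidates(ocr_scores, top_k=2):
    """Level 1: keep the top-k candidate characters at each position."""
    return [sorted(pos, key=lambda c: c[1], reverse=True)[:top_k]
            for pos in ocr_scores]


def generate_eojeols(candidates, lexicon):
    """Level 2: combine candidate characters into eojeol candidates.

    A lexicon lookup stands in for morphological analysis here
    (an assumption made only to keep the sketch short).
    """
    results = []
    for combo in product(*candidates):
        word = "".join(ch for ch, _ in combo)
        if word in lexicon:
            # Character-level score: sum of log OCR confidences.
            results.append((word, sum(log(p) for _, p in combo)))
    return results


def select_sequence(lattice, bigram_logprob, unseen=-10.0):
    """Level 3: Viterbi-style search for the best eojeol sequence.

    A toy bigram model stands in for linguistic evaluation.
    """
    best = {w: (s, [w]) for w, s in lattice[0]}
    for column in lattice[1:]:
        nxt = {}
        for w, s in column:
            nxt[w] = max(
                ((ps + bigram_logprob.get((pw, w), unseen) + s, path + [w])
                 for pw, (ps, path) in best.items()),
                key=lambda t: t[0],
            )
        best = nxt
    return max(best.values(), key=lambda t: t[0])[1]


# Hypothetical OCR output: per eojeol, per position, (character, confidence).
ocr_output = [
    [[("확", 0.6), ("학", 0.4)], [("교", 0.9), ("고", 0.1)], [("에", 0.7), ("예", 0.3)]],
    [[("간", 0.9), ("긴", 0.1)], [("디", 0.6), ("다", 0.4)]],
]
lexicon = {"학교에", "간다", "간디"}          # toy lexicon (assumption)
bigrams = {("학교에", "간다"): -1.0}          # toy log-probabilities (assumption)

lattice = [generate_eojeols(select_candidates(e), lexicon) for e in ocr_output]
best = select_sequence(lattice, bigrams)
print(best)
```

With these toy numbers, the second eojeol's top-ranked character string (간디) is a valid word, but the bigram context after 학교에 favors 간다, which is the kind of correction that character-level information alone cannot make.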

Related resources

A Contextual Post-processing Model for Korean OCR using Synthesized Statistical Information

In this paper, we describe a contextual Korean OCR post-processing model that takes unknown words into account. This work starts from the following premises: 1) in a language with a very large character set, it is hard to directly correct an erroneous string; 2) word formation is deeply related not only to morphological features but also to phonological features (esp. syllable combination on the surface level)...

Online recognition of isolated handwritten Persian characters based on main-body group detection using a support vector machine

This paper proposes a new method for the online recognition of handwritten Persian characters, which uses a set of simple features and a Support Vector Machine (SVM) as the classifier. The preprocessing step equalizes the feature vectors of different characters. The algorithm is implemented in two steps. In the first step, the input character is classified into one of eighteen ...

Learning the lexicon from raw texts for open-vocabulary Korean word recognition

In this paper, we propose a novel method of building a language model for open-vocabulary Korean word recognition. Due to the complex morphology of Korean, it is inappropriate to use lexicons based on linguistic entities such as words and morphemes in open-vocabulary domains. Instead, we build the lexicon by collecting variable-length character sequences from the raw texts using a dynamic Ba...

Integrating connectionist, statistical and symbolic approaches for continuous spoken Korean processing

This paper presents a multi-strategic, hybrid approach to large-scale integrated speech and natural language processing, employing connectionist, statistical and symbolic techniques. The developed spoken Korean processing engine (SKOPE) integrates a connectionist TDNN-based phoneme recognition technique with statistical Viterbi-based lexical decoding and symbolic morphological/phonological an...

SKOPE: A connectionist/symbolic architecture of spoken Korean processing

Spoken language processing requires speech and natural language integration. Moreover, spoken Korean calls for unique processing methodology due to its linguistic characteristics. This paper presents SKOPE, a connectionist/symbolic spoken Korean processing engine, which emphasizes that: 1) connectionist and symbolic techniques must be selectively applied according to their relative strength and...

Journal:
  • Pattern Recognition

Volume 30

Publication date: 1997